Author Details

Thakkar, Amit

Learning Using Heterogeneous Classifier in Data Mining

Improved K-Means with Dimensionality Reduction Technique

Authors

Amit Thakkar ¹, Nikita Bhatt ¹, Amit Ganatra ¹, Arpita Shah ¹

Affiliations
1 Charotar Institute of Technology, Changa, Nadiad, Gujarat, IN

Source

Data Mining and Knowledge Engineering, Vol 3, No 12 (2011), Pagination: 722-725

Abstract

Clustering is the process of finding groups of objects such that the objects in a group are similar to one another and different from the objects in other groups. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters, each represented by its centroid. The K-means algorithm often does not work well on high-dimensional data; hence, to improve efficiency, we apply PCA, a dimensionality reduction technique, to the data set and obtain a reduced data set containing possibly uncorrelated variables. A challenging task for any clustering method is determining the number of clusters beforehand. To find the number of clusters, we apply the EM method, which suggests how many clusters the user should choose by determining a mixture of Gaussians that fits the given data set. Finally, the experimental results show that the use of techniques such as PCA and EM improves the efficiency of K-means clustering.
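
The pipeline described in the abstract (reduce dimensionality, then cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the function names, the power-iteration PCA that keeps only one component, and the deterministic centroid initialization are all assumptions made for illustration, and the EM step for choosing the number of clusters is omitted (k is simply given).

```python
def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def top_component(rows, iters=100):
    """Leading principal direction via power iteration on the covariance.

    Assumes the start vector is not orthogonal to that direction.
    """
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def assign(values, centroids):
    """Index of the nearest centroid for each 1-D value."""
    return [min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
            for x in values]

def kmeans_1d(values, k, iters=50):
    """Plain K-means on 1-D data (assumes k >= 2)."""
    ordered = sorted(values)
    # Deterministic init (an assumption): evenly spaced order statistics.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        labels = assign(values, centroids)
        for c in range(k):
            members = [x for x, l in zip(values, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return assign(values, centroids), centroids

# Hypothetical 3-D data: two well-separated groups.
data = [[0.0, 0.1, 0.2], [0.2, 0.0, 0.1], [0.1, 0.2, 0.0],
        [10.0, 10.1, 10.2], [10.2, 10.0, 10.1], [10.1, 10.2, 10.0]]

centered = center(data)
direction = top_component(centered)
# Project each 3-D point onto the leading principal direction (3-D -> 1-D).
projected = [sum(a * b for a, b in zip(row, direction)) for row in centered]
labels, centroids = kmeans_1d(projected, k=2)
```

Clustering the projected values rather than the raw rows means each distance computation touches one number instead of three, which is the kind of saving the abstract attributes to PCA.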

Keywords

Cluster, EM, K-Mean, PCA.

Comprehensive and Evolution Study Focusing Future Research Challenges in the Field of Multi Relational Data Mining Specific to Multi-Relational Classification Approaches

Authors

Amit Thakkar ¹, Y. P. Kosta ²

Affiliations
1 Chandubhai S. Patel Institute of Technology, Changa, Gujarat, IN
2 Marwadi Group of Institutions, Rajkot, Gujarat, IN

Source

Data Mining and Knowledge Engineering, Vol 3, No 10 (2011), Pagination: 594-598

Abstract

Most of today’s structured data is stored in relational databases. Thus, the task of learning from relational data has begun to receive significant attention in the literature. Unfortunately, most methods only utilize “flat” data representations. To apply these single-table data mining techniques, we are therefore forced to incur a computational penalty by first converting the data into this “flat” form. As a result of this transformation, the data not only loses its compact representation, but the semantic information present in the relations is also reduced or eliminated. As an important task of multi-relational data mining, multi-relational classification can look directly for patterns that involve multiple relations in a relational database and has several advantages over propositional data mining approaches. According to their differences in knowledge representation and strategy, the paper groups multi-relational classification approaches into ILP-based, graph-based, and relational database-based approaches, and discusses in detail each technology, its characteristics, comparisons among them, and several challenging research problems.
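
The cost of "flattening" mentioned in the abstract can be seen in a small sketch. The schema, table names, and values below are hypothetical and are not taken from the paper; the example only contrasts two common ways of producing a single table from two related ones.

```python
import sqlite3

# Hypothetical two-table database: customers and their purchases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE purchase (customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'retail'), (2, 'wholesale');
INSERT INTO purchase VALUES (1, 10.0), (1, 25.0), (1, 5.0), (2, 400.0);
""")

# Flattening by join: one row per purchase, so customer attributes
# are repeated and the compact representation is lost.
joined = conn.execute(
    "SELECT c.id, c.segment, p.amount FROM customer c "
    "JOIN purchase p ON p.customer_id = c.id ORDER BY c.id, p.amount"
).fetchall()

# Flattening by aggregation: one row per customer, but the individual
# purchases (part of the relational semantics) are collapsed.
aggregated = conn.execute(
    "SELECT c.id, c.segment, COUNT(p.amount), SUM(p.amount) FROM customer c "
    "JOIN purchase p ON p.customer_id = c.id GROUP BY c.id ORDER BY c.id"
).fetchall()
conn.close()
```

Either flat table can be fed to a single-table learner, but the join duplicates data and the aggregate discards detail, which is the trade-off that motivates learning directly from multiple relations.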

Keywords

Multi-Relational Data Mining, Multi-Relational Classification, Inductive Logic Programming (ILP), Graph, Selection Graph, Tuple ID Propagation.

Classification using Generalization Based Decision Tree Induction along with Relevance Analysis Based on Relational Database

Authors

Amit Thakkar ¹, Yogeshwar P. Kosta ², Amit Ganatra ²

Affiliations
1 Charotar Institute of Technology, Changa, Gujarat, IN
2 Charotar Institute of Technology, Changa, Gujarat, IN

Source

Data Mining and Knowledge Engineering, Vol 2, No 10 (2010), Pagination: 287-293

Abstract

Classification is a process of sorting unknown values of certain attributes of interest based on the values of other attributes, and is a major challenge in data mining. A commonly used method is the decision tree. The efficiency of decision tree algorithms has been well established for relatively small data sets. However, this method of classification has problems when handling larger data sets and data with continuous numerical values, and it tends to favor attributes with many distinct values when selecting the final determining attribute. In data mining applications, large training sets are common; therefore, decision tree algorithms face scalability limitations. Also, in most data mining applications, users have little knowledge of which signature attribute should be selected for effective mining, so the user depends more heavily on the capability of the algorithm. In this paper, we address two things: selecting the right signature attribute and handling large data sets. We accomplish this by proposing a new data classification method that integrates a sequence of steps: data cleaning, attribute-oriented induction (identifying the signature attribute), and relevance analysis as preprocessing, followed by decision tree induction. This stepwise approach helps us to define simple extraction rules at multiple levels of abstraction and handles large data sets and continuous numerical values in a scalable way.
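
Relevance analysis as a preprocessing step can be sketched as follows. The abstract does not say which relevance measure the authors use; information gain is assumed here as one common choice, and the attribute names, the toy rows, and the 0.1 threshold are all hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in entropy of `target` after splitting on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical generalized tuples, as might result from
# attribute-oriented induction.
rows = [
    {"degree": "graduate", "region": "north", "status": "employed"},
    {"degree": "graduate", "region": "south", "status": "employed"},
    {"degree": "undergraduate", "region": "north", "status": "student"},
    {"degree": "undergraduate", "region": "south", "status": "student"},
]

gains = {a: information_gain(rows, a, "status") for a in ("degree", "region")}
# Keep only attributes whose gain exceeds an assumed threshold.
relevant = [a for a, g in gains.items() if g > 0.1]
```

In this toy data `degree` fully determines `status` while `region` carries no information about it, so only `degree` would be passed on to decision tree induction.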

Keywords

Data Mining, Classification, Data Cleaning, Decision Tree Induction, Relevance Analysis.
